Abstract

This article concerns dimension reduction in regression for large
datasets. We introduce a new method based on the sliced inverse regression
approach, called cluster-based regularized sliced inverse regression.
Our method not only keeps the merit of considering both response and
predictor information, but also enhances the capability of handling
highly correlated variables. It is justified under certain linearity
conditions. An empirical application to the Stock and Watson (2011)
macroeconomic dataset shows that our method outperformed the dynamic
factor model and other shrinkage methods.
1 Introduction



Forecasting using many predictors has received a good deal of attention in
recent years. The curse of dimensionality has been turned into a blessing
by the abundant information in large datasets. Various methods have been
developed to extract efficient predictors, for example, the dynamic factor
model (DFM), Bayesian model averaging, Lasso, and boosting. Among them, the
dynamic factor model is conceptually appealing in macroeconomics because it is
structurally consistent with log-linearized models such as dynamic stochastic
general equilibrium models.



Boivin and Ng (2005) assessed the extent to which the forecasts are
influenced by how the factors are estimated and/or how the forecasts are
formulated. They did not find one method that always stands out as
systematically good or bad. A meta-study by Eickmeier and Ziegler (2008) also
found mixed performance of DFM forecasts. Stock and Watson (2011) compared
the dynamic factor model with some recent multi-predictor methods. They
concluded that the dynamic factor model could not be outperformed by these
methods for all the forecasting series in their dataset.

Recent developments in statistics provide a new method of dimension
reduction in regression for high-dimensional data. The literature stems from
Duan and Li (1991) and Li (1991a), which proposed a new way of thinking
in regression analysis, called sliced inverse regression (SIR). SIR reverses
the roles of the response y and the predictors x. Classical regression methods
mainly deal with the conditional density f(y|x). SIR collects information on
the variation of the predictors x along with the change of the response y by
exploring the conditional density h(x|y). Usually the dimension of the response
is far lower than the dimension of the predictors; hence, it is a way to avoid
the "curse of dimensionality".

The traditional SIR does not work well for highly correlated data, because
the algorithm requires the inverse of the covariance matrix. This is not feasible
when the number of variables N is greater than the number of observations T,
which happens often in economic studies. In addition, economic variables
are often highly correlated, because they are derived from related formulas or
belong to the same category. This makes the covariance matrix ill-conditioned,
so that its inverse lacks precision and is too sensitive to variation in the
matrix entries, which leads to false or unstable results. There are some
extensions of SIR for highly collinear data and "T < N" problems, for example,
regularized sliced inverse regression (Zhong, Zeng, Ma, Liu, and Zhu (2005),
Li and Yin (2008)) and partial inverse regression (Li, Cook, and Tsai (2007)).

In this paper, we propose a new method of dimension reduction, called the
cluster-based regularized sliced inverse regression (CRSIR) method, for many
predictors in a data-rich environment. We evaluate its properties theoretically
and use it for forecasting macroeconomic series. A comparison based on a pseudo
out-of-sample forecasting simulation shows the advantage of our method.

The remainder of the paper is organized as follows. Section 2 introduces the
cluster-based SIR method and its statistical properties. An empirical
application to the macroeconomic dataset used by Stock and Watson (2011) is
given in Section 3. Conclusions with some discussion are provided in Section 4.



2 Modeling and methods 



The regression model in Li (1991a) takes the form

$$y = g(\beta_1' x, \beta_2' x, \ldots, \beta_K' x, \varepsilon), \qquad (2.1)$$

where the response y is univariate, x is an N-dimensional vector, and the
random error $\varepsilon$ is independent of x. Figure 1 gives a straightforward
illustration of Model (2.1), which means that y depends on x only through the
K-dimensional subspace spanned by the projection vectors
$\beta_1, \ldots, \beta_K$, known as the effective dimension reducing
directions (e.d.r.-directions) (Li (1991a)).
Many methods can be used to find the e.d.r.-directions; for example,
principal component analysis might be the most commonly used one in economics.
But unlike these methods, SIR not only reduces dimensions in regression but
also integrates the information from both predictors and response. Moreover,
different from the classical regression methods, SIR intends to collect
information on how x changes along with y. That is to say, instead of estimating
the forward regression function $\eta(x) = E(y|x)$, inverse regression considers
$\xi(y) = E(x|y)$. Compared with $\eta(x)$, the inverse regression function
$\xi(y)$ depends on the one-dimensional y, which makes the operation much easier.



Li (1991a) showed that using the SIR method, the e.d.r.-directions can be
estimated by solving

$$\mathrm{Cov}(E(x|y))\,\beta_j = \nu_j\,\mathrm{Cov}(x)\,\beta_j, \qquad (2.2)$$

where $\nu_j$ is the jth eigenvalue and $\beta_j$ is the corresponding
eigenvector of $\mathrm{Cov}(E(x|y))$ with respect to $\mathrm{Cov}(x)$. During
the forecasting procedure, the covariance matrices can be replaced by their
usual moment estimates.
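The generalized eigenproblem (2.2) can be illustrated with a minimal NumPy sketch. The function name `sir_directions`, the equal-count slicing of the sorted response, and the defaults `n_slices=10` and `n_dirs=2` are assumptions made for illustration; they are not taken from the paper. The sketch also assumes the sample covariance of x is invertible.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=2):
    """Sketch of SIR: solve Cov(E(x|y)) b = nu Cov(x) b with moment estimates.

    X is T x N (observations in rows), y has length T.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    T, N = X.shape
    Xc = X - X.mean(axis=0)
    cov_x = Xc.T @ Xc / T
    # Slice the observations by the order of y and average x within each slice;
    # the weighted covariance of the slice means estimates Cov(E(x|y)).
    cov_slice_means = np.zeros((N, N))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Xc[idx].mean(axis=0)
        cov_slice_means += (len(idx) / T) * np.outer(m, m)
    # Reduce the generalized problem to an ordinary symmetric one:
    # with W = Cov(x)^{-1/2}, solve (W M W) u = nu u and set b = W u.
    evals, evecs = np.linalg.eigh(cov_x)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T
    nu, U = np.linalg.eigh(W @ cov_slice_means @ W)
    top = np.argsort(nu)[::-1][:n_dirs]
    return nu[top], W @ U[:, top]
```

The inverse square root of the covariance matrix in this sketch is exactly the step that becomes unstable for collinear predictors.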
2.1 Cluster-based sliced inverse regression

In this section, we combine a clustering methodology with sliced inverse
regression to improve the performance of SIR on collinear data.

Assume that the variables of interest can be clustered into several blocks,
so that two variables within the same block are correlated with each other, and
any two variables belonging to different blocks are independent. In practice, an
orthogonalization procedure can be applied to reduce the correlations between
blocks in order to fit our assumption. Thus, we can cluster the variables
according to their correlations and find the e.d.r.-directions within each
cluster, because there is no shared information between clusters.



The clustering method we use is hierarchical clustering (Ward (1963)) with
complete linkage. The dissimilarity is defined as 1 - |Correlation|.
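A minimal sketch of this clustering step follows, assuming the variables are the columns of a T x N array. The function name `cluster_by_correlation` is hypothetical, and the naive agglomerative loop is written for readability rather than speed.

```python
import numpy as np

def cluster_by_correlation(X, n_clusters):
    """Complete-linkage agglomerative clustering of the columns of X,
    using the dissimilarity 1 - |correlation|."""
    D = 1.0 - np.abs(np.corrcoef(X, rowvar=False))
    clusters = [[j] for j in range(D.shape[0])]
    while len(clusters) > n_clusters:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: distance between the two farthest members.
                d = D[np.ix_(clusters[a], clusters[b])].max()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]
```

For larger problems, a library routine such as `scipy.cluster.hierarchy.linkage` with `method="complete"` applied to the same dissimilarities would be a natural substitute.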

The algorithm for the cluster-based SIR method can be described as follows.

1. Standardize each explanatory variable to zero mean and unit variance.

2. Cluster x (N x 1) into $(x_1', \ldots, x_c')'$ based on the correlation
matrix of x, where $x_i$ is $N_i \times 1$, $\sum_{i=1}^{c} N_i = N$, and c is
the number of clusters, which will be determined by cross-validation.

3. Restricted to each cluster, perform the SIR method and pick $k_i$ SIR
directions based on the sequential chi-square test (Li, 1991), say
$\theta_j^{(i)}$, $j = 1, \ldots, k_i$, $i = 1, \ldots, c$.

4. Collect all the SIR variates obtained from the clusters, say
$\{\theta_j^{(i)\prime} x_i,\ i = 1, 2, \ldots, c,\ j = 1, 2, \ldots, k_i\}$.

5. Let $\lambda_l = (O_1', \theta_j^{(i)\prime}, O_2')'$, $l = 1, \ldots, m$,
$m = \sum_{i=1}^{c} k_i$, where $O_1$ and $O_2$ are zero column vectors with
dimensions $\sum_{k=1}^{i-1} N_k$ and $\sum_{k=i+1}^{c} N_k$ respectively.
Denote $\Lambda = (\lambda_1, \lambda_2, \ldots, \lambda_m)$. The variates
$\{\theta_j^{(i)\prime} x_i\}$ can be written in vector form as
$(\lambda_1' x, \ldots, \lambda_m' x)' = \Lambda' x$.

6. Then, apply the SIR method one more time to the pooled variates
$\Lambda' x$ to reduce dimensions further, and get the e.d.r.-directions
$(\gamma_1, \gamma_2, \ldots, \gamma_v)$, where v is also determined by the
sequential chi-square test. Denote
$\Gamma = (\gamma_1, \gamma_2, \ldots, \gamma_v)$; the final CRSIR variates we
choose are $\Gamma' \Lambda' x$.

7. Estimate the values of the forecasting series using the CRSIR variates
$\Gamma' \Lambda' x$. Linear models are used in this paper.

Note that the matrix $\Gamma$ is $m \times v$ and $\Lambda$ is $N \times m$, so
$\Gamma' \Lambda' x$ is $v \times 1$. Therefore, we only use v factors to build
the final model for forecasting y, instead of using the N variables of the
original dataset.
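The seven steps can be sketched end to end as below. This is an illustration under simplifying assumptions, not the authors' implementation: the clusters are passed in as lists of column indices (step 2 is taken as given), the number of directions per cluster `k` and the final number `v` are fixed by the caller instead of being chosen by the sequential chi-square test, and no regularization is applied. The names `sir`, `crsir_fit`, and `crsir_predict` are hypothetical.

```python
import numpy as np

def sir(X, y, n_dirs, n_slices=10):
    """Top SIR directions for predictors X (T x p) and response y."""
    T = X.shape[0]
    Xc = X - X.mean(axis=0)
    cov_x = Xc.T @ Xc / T
    M = np.zeros_like(cov_x)
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Xc[idx].mean(axis=0)
        M += (len(idx) / T) * np.outer(m, m)
    evals, evecs = np.linalg.eigh(cov_x)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T
    nu, U = np.linalg.eigh(W @ M @ W)
    return W @ U[:, np.argsort(nu)[::-1][:n_dirs]]

def crsir_fit(X, y, clusters, k=1, v=1):
    """Steps 1 and 3-7 with fixed k_i = k and v (a simplification)."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sd                               # step 1: standardize
    N = X.shape[1]
    cols = []
    for idx in clusters:                            # steps 3-5
        theta = sir(Z[:, idx], y, k)                # SIR within one cluster
        for j in range(theta.shape[1]):
            lam = np.zeros(N)                       # zero-padded column of Lambda
            lam[idx] = theta[:, j]
            cols.append(lam)
    Lam = np.column_stack(cols)                     # N x m
    Gam = sir(Z @ Lam, y, v)                        # step 6: m x v
    F = Z @ Lam @ Gam                               # CRSIR variates, T x v
    A = np.column_stack([np.ones(len(y)), F])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)    # step 7: linear model
    return {"mu": mu, "sd": sd, "Lam": Lam, "Gam": Gam, "coef": coef}

def crsir_predict(model, X):
    """Predict the response from new predictors with a fitted sketch model."""
    Z = (X - model["mu"]) / model["sd"]
    F = Z @ model["Lam"] @ model["Gam"]
    return model["coef"][0] + F @ model["coef"][1:]
```

Zero-padding each within-cluster direction to length N is what lets the pooled variates be written as a single matrix product with x, as in step 5.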



2.2 Statistical property of cluster-based SIR 



Li (1991a) established the unbiasedness of the e.d.r.-directions found by SIR,
assuming the following linearity condition.

Linearity Condition 1. For any $b \in \mathbb{R}^N$, the conditional expectation
$E(b'x \mid \beta_1'x, \ldots, \beta_K'x)$ is linear in
$\beta_1'x, \ldots, \beta_K'x$.



The linearity condition is not easy to verify; however, Eaton (1986) showed
that when x is elliptically symmetrically distributed, for example multivariate
normally distributed, the linearity condition holds. Furthermore, Hall and Li
(1993) showed that elliptical symmetry is not a restrictive assumption, because
the linearity condition holds approximately when N is large even if the dataset
is not elliptically symmetric.

Without loss of generality, we assume each variable in x has been
standardized to zero mean and unit variance for our discussion. Li (1991a)
proved the following theorem.

Theorem 1. Assume Linearity Condition 1. The centered inverse regression
curve $E(x|y)$ is contained in the space spanned by $\Sigma_x \beta_j$,
$j = 1, \ldots, K$, where $\Sigma_x$ is the covariance matrix of x.
Figure 2 shows a three-dimensional case when $x = (x_1, x_2, x_3)'$. Since the
inverse regression function $E(x|y)$ is a function of y, it draws a curve in the
three-dimensional space when y changes. Theorem 1 indicates that such a curve
is located exactly on the plane spanned by two directions $d_1$ and $d_2$ from
$\Sigma_x \beta_j$, $j = 1, 2$, assuming K = 2.

A similar unbiasedness property can be proved for our cluster-based SIR.

Theorem 2. Under certain linearity conditions, $E(x|y)$ is contained in the
space spanned by $\Sigma_x \Lambda \Gamma$.

Theorem 2 describes the desirable property that there is no estimation bias.
The e.d.r.-space estimated by our CRSIR method contains the true inverse
regression curve. The details of the proof are provided in the Appendix.
2.3 Orthogonalization

For a given dataset X with dimension N x T, and clusters $X_1, \ldots, X_c$, the
correlations between these clusters need to be reduced to zero to achieve
cluster-wise independence. QR decomposition along with projection operators
is used to perform the orthogonalization.

To begin with, use QR decomposition to find an orthogonal basis of the
first cluster $X_1$, named $Q_1$. Next, project the second cluster $X_2$ onto
$\mathrm{span}\{Q_1\}^{\perp}$, the orthogonal complement of the space spanned
by $X_1$, and name the projection $X_2^*$:

$$X_2^* = (I - Q_1 Q_1') X_2. \qquad (2.3)$$

Then use QR decomposition again to find an orthogonal basis of $X_2^*$,
named $Q_2$, and project $X_3$ onto $\mathrm{span}\{Q_1, Q_2\}^{\perp}$ to
obtain $X_3^*$. Continuing this process through the last cluster $X_c$ gives a
new sequence of clusters $X_1, X_2^*, \ldots, X_c^*$, in which every two
clusters are orthogonal, and the new sequence contains all the information of
the original dataset X.
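The sequential projection can be sketched with `numpy.linalg.qr`. The sketch assumes each cluster is stored with observations in rows and variables in columns, so that the projector in (2.3) acts on the observation dimension, and that each projected block has full column rank. The function name `orthogonalize_blocks` is hypothetical.

```python
import numpy as np

def orthogonalize_blocks(blocks):
    """Project each block onto the orthogonal complement of the span of all
    earlier blocks, as in X_2* = (I - Q_1 Q_1') X_2."""
    out, Q_all = [], None
    for Xb in blocks:
        Xb = np.asarray(Xb, dtype=float)
        if Q_all is not None:
            Xb = Xb - Q_all @ (Q_all.T @ Xb)   # apply (I - Q Q') without forming it
        Q, _ = np.linalg.qr(Xb)                # orthonormal basis of this block
        Q_all = Q if Q_all is None else np.hstack([Q_all, Q])
        out.append(Xb)
    return out
```

Applying the projector as two matrix products avoids building the T x T matrix I - QQ' explicitly.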



2.4 Regularization



Due to the high correlations between the series within each cluster, the
covariance matrix $\Sigma_{x_i}$ of each cluster is ill-conditioned, which makes
it hard to invert. We suggest a regularized version of the covariance matrix to
overcome this issue (Friedman (1989)),

$$\Sigma_{x_i}(\tau) = (1 - \tau)\,\Sigma_{x_i} + \tau\,\frac{\mathrm{tr}\,\Sigma_{x_i}}{N_i}\,I_{N_i}, \qquad (2.4)$$

where $\tau \in [0, 1]$ is the shrinkage parameter. This is similar to the ridge
version proposed by Zhong et al. (2005), which replaces $\Sigma_{x_i}$ with
$\Sigma_{x_i} + \tau I_{N_i}$.

The shrinkage parameter $\tau$ can be chosen by cross-validation. Note that when
$\tau = 1$, the regularized covariance matrix degenerates to a diagonal matrix
whose diagonal elements are the mean of the eigenvalues of $\Sigma_{x_i}$. In
such a case, the chosen e.d.r.-direction is one of the input series, and the
other series, which may also contain information for the predictors, are
discarded.
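A small sketch of the shrinkage step is given below. It reads Equation (2.4) as a convex combination of the cluster covariance matrix and a scaled identity whose scale is the mean eigenvalue, which is the form consistent with the degenerate case described above; the function name `regularized_cov` is hypothetical.

```python
import numpy as np

def regularized_cov(S, tau):
    """Return (1 - tau) * S + tau * (tr S / p) * I for a p x p covariance S."""
    S = np.asarray(S, dtype=float)
    p = S.shape[0]
    return (1.0 - tau) * S + tau * (np.trace(S) / p) * np.eye(p)
```

For a 10 x 10 matrix with ones on the diagonal and 0.9 elsewhere, the eigenvalues are 9.1 and 0.1, so the condition number is 91; with a shrinkage parameter of 0.5 they become 5.05 and 0.55, and the condition number drops to about 9.2.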



2.5 Comparison between CRSIR and SIR 

Before applying the proposed CRSIR method to real data, we first consider the
following simulated example to compare the performance of the CRSIR and SIR
methods.

Let $x_1, x_2, \ldots, x_{10}$ be independent and identically distributed
(i.i.d.) with multivariate normal distribution $N(0, \Sigma)$, where $\Sigma$ is
a 10 x 10 covariance matrix with 1 on the diagonal and 0.9 off the diagonal. The
random error $\varepsilon$ is independent of the $x_i$'s and follows the normal
distribution $N(0, 0.1)$.

The response y is simulated using the following formula.

$$y = \sum_{j=1}^{10} \cdots \, x_j + \varepsilon \qquad (2.5)$$

Root mean square error (RMSE) is considered as a criterion to evaluate
the prediction performance,

$$\mathrm{RMSE} = \sqrt{\frac{1}{T} \sum_{i=1}^{T} (\hat{y}_i - y_i)^2},$$

where $\hat{y}_i$ is the ith predicted value of the response, $y_i$ is the ith
observed value, and T is the number of observations.

We simulate 300 observations at each run under the above conditions. In CRSIR,
the parameters c and $\tau$ are chosen as c = 10, $\tau$ = 0.5 to minimize RMSE.
Table 1 presents the means and standard deviations of the RMSE of CRSIR
and SIR across 100 runs.

From Table 1 it is clear that CRSIR has a much smaller RMSE than SIR.
In fact, our other simulations, which are not presented here, show that CRSIR
performs even better when the sample size T decreases to N.
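Two ingredients of this simulation that are fully specified in the text can be sketched directly: drawing from the equicorrelated normal distribution and computing the RMSE. The response formula is deliberately left out of the sketch, and the function names `equicorrelated_sample` and `rmse` are hypothetical.

```python
import numpy as np

def equicorrelated_sample(T=300, p=10, rho=0.9, seed=0):
    """Draw T observations from N(0, S), where S is p x p with ones on the
    diagonal and rho off the diagonal."""
    rng = np.random.default_rng(seed)
    S = np.full((p, p), rho)
    np.fill_diagonal(S, 1.0)
    return rng.multivariate_normal(np.zeros(p), S, size=T)

def rmse(y_hat, y):
    """Root mean square error between predictions and observations."""
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sqrt(np.mean((y_hat - y) ** 2))
```

With a correlation of 0.9 between every pair of variables, the sample covariance matrix of such a draw is strongly ill-conditioned, which is the setting the regularization is meant to address.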
Table 1: RMSE for CRSIR and SIR

          CRSIR     SIR
Mean      11.73     17.04
S.D.       0.81      1.95


3 Empirical application 



3.1 Dataset and method 



The dataset we use is the Stock and Watson (2011) dataset, which contains 143
quarterly macroeconomic variables from 13 economic categories. We use 109 of
them as explanatory variables, since the other 34 are just high-level aggregates
of the 109. All 143 variables are used as series to be forecast.

The correlation plot of the 109 predictor series after logarithm and/or
differencing transformation is shown in Figure 3, which demonstrates that
there do exist some highly correlated blocks. Therefore, a cluster-based
method is appropriate for this dataset.

For the purpose of comparison, a rolling pseudo out-of-sample forecasting
simulation similar to that in Stock and Watson (2011) is used. The main steps
can be described as follows.

1. Use the formulas given by Stock and Watson (2011) in Table B.2 to
transform all the series and screen for outliers.

2. From 1985 to 2008, at each date t, apply cross-validation, which is described
below, to the most recent 100 observations to choose the parameters c and
$\tau$ in CRSIR based on mean squared error.

3. Using the chosen c and $\tau$, apply CRSIR one more time to predict
$y_{t+h}$, where h is the forecasting period.

4. Calculate the RMSE for the forecasting procedure,

$$\mathrm{RMSE} = \sqrt{\sum_{t=1}^{T} (\hat{y}_{t+h} - y_{t+h})^2 / T}.$$
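The rolling scheme can be sketched as a loop over forecast origins. This is an illustration under assumptions: the training window is taken to be the most recent `window` pairs whose targets are already observed at date t, parameter selection by cross-validation is omitted, and the forecaster is passed in as a callable. The names `rolling_rmse` and `ols_fit_predict` are hypothetical, and the ordinary least squares forecaster is only a placeholder, not CRSIR.

```python
import numpy as np

def rolling_rmse(y, X, h, fit_predict, window=100):
    """At each date t, train on the most recent `window` observations and
    forecast y[t + h]; return the RMSE of those forecasts."""
    errors = []
    for t in range(window + h - 1, len(y) - h):
        # Training pairs (X[s], y[s + h]) with s + h <= t: no future data used.
        s = np.arange(t - h - window + 1, t - h + 1)
        y_hat = fit_predict(X[s], y[s + h], X[t])
        errors.append(y_hat - y[t + h])
    return np.sqrt(np.mean(np.square(errors)))

def ols_fit_predict(X_train, y_train, x_new):
    """Placeholder forecaster: ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return coef[0] + x_new @ coef[1:]
```

A relative RMSE is then the ratio of `rolling_rmse` for the method of interest to the same quantity for the benchmark forecaster.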



The steps for cross-validation are described as follows.

1. Regress $y_{t+h}$ and $x_t$ on the autoregressive terms
$1, y_t, y_{t-1}, y_{t-2}, y_{t-3}$, in order to eliminate the autoregressive
effect. Denote the residuals as $\tilde{y}_{t+h}$ and $\tilde{x}_t$.

2. Let $\Omega(t) = \{1, \ldots, t - 2h - 3, t + 2h + 3, \ldots, 100\}$. At each
date $t = 1, \ldots, 100 - h$, find the e.d.r.-directions and linear regression
model using CRSIR and the observations $\tilde{y}_{i+h}$ and $\tilde{x}_i$,
$i \in \Omega(t)$.

3. Use the e.d.r.-directions and linear regression model from the above step
at date t to predict $\tilde{y}_{t+h}$.

4. For fixed h, the parameters (c, $\tau$) are chosen by minimizing the sum of
squared forecasting errors over $t = 1, \ldots, 100 - h$.
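The index set used in step 2 can be written out explicitly. The sketch below uses 1-based dates to match the text and returns the dates retained for fitting when date t is held out; the function name `cv_index_set` is hypothetical.

```python
def cv_index_set(t, h, n=100):
    """Dates {1, ..., t - 2h - 3} together with {t + 2h + 3, ..., n}."""
    left = list(range(1, t - 2 * h - 3 + 1))
    right = list(range(t + 2 * h + 3, n + 1))
    return left + right
```

For example, with t = 50 and h = 1 the retained dates are 1 to 45 and 55 to 100, so a band of nine dates around the held-out date is excluded, presumably to keep observations whose lags or h-step targets overlap with date t out of the fit.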



We compare our method with the dynamic factor model using the first five
principal components (DFM-5), which was claimed to be no worse than any
other shrinkage method according to Stock and Watson (2011). An autoregressive
model of order 4 (AR(4)) is used as a benchmark, and all RMSEs are recorded
as ratios relative to AR(4). A smaller relative RMSE indicates better
forecasting performance.




Figure 3: Plot of Correlations of the 109 Predictor Series

3.2

Table 2 presents the number of series with smaller RMSE than the AR(4)
model for CRSIR and DFM-5. We can see that for forecasting period h = 1,
if CRSIR is used, 97 series out of 143 have a smaller RMSE than the
benchmark AR(4) model. If DFM-5 is used, only 85 series out of 143 have a
smaller RMSE than the AR(4) model. The differences become even larger for longer
forecasting periods: when h = 4 the number of series for CRSIR increases to 115
while the number for DFM-5 decreases to 53.

Table 2: Number of Series with Smaller RMSE than AR(4) Model

          DFM-5     CRSIR
h = 1       85        97
h = 2       59       109
h = 4       53       115


Table 3 presents the distributions of the RMSEs for the AR(4), DFM-5, and
CRSIR methods. When h = 1, the first quartile of the relative RMSE of
CRSIR is just 0.768, which is much smaller than that of DFM-5
(0.961), and the median relative RMSE of CRSIR is 0.907, while DFM-5 has
0.993. When h = 2 and h = 4, CRSIR improves the forecasting results of
AR(4) for more than 3/4 of the series. For these two forecasting periods, the
relative RMSEs of CRSIR at the first, second, and third quartiles are all
smaller than those of DFM-5.

From Tables 2 and 3 one can tell that CRSIR improves the forecasting
results substantially compared to the DFM-5 method, especially for longer
forecasting periods.

Table 3: Distributions of Relative RMSEs by Pseudo Out-of-Sample Forecasting

(I) h = 1

Method              Percentiles
          0.050   0.250   0.500   0.750   0.950
AR(4)     1.000   1.000   1.000   1.000   1.000
DFM-5     0.874   0.961   0.993   1.022   1.089
CRSIR     0.621   0.768   0.907   1.048   1.372

(II) h = 2

Method              Percentiles
          0.050   0.250   0.500   0.750   0.950
AR(4)     1.000   1.000   1.000   1.000   1.000
DFM-5     0.882   0.976   1.010   1.044   1.125
CRSIR     0.652   0.759   0.865   0.991   1.186

(III) h = 4

Method              Percentiles
          0.050   0.250   0.500   0.750   0.950
AR(4)     1.000   1.000   1.000   1.000   1.000
DFM-5     0.903   0.980   1.020   1.058   1.138
CRSIR     0.648   0.736   0.827   0.940   1.220

Table 4 presents the median RMSE relative to the AR(4) model by category via
cross-validation. Column "S&W" reports the smallest relative RMSE Stock
and Watson got using DFM-5 and other shrinkage methods in their 2011 paper.
Comparing all these results, the CRSIR method has smaller median relative
RMSEs for more than 70% of these categories across the three forecasting
periods, which again demonstrates its advantage.

Table 4: Median Relative RMSE for Forecasting by Category of Series

                                   h = 1                   h = 2                   h = 4
Category                   DFM-5   S&W     CRSIR   DFM-5   S&W     CRSIR   DFM-5   S&W     CRSIR
 1. GDP Components         0.905   0.905   1.079   0.907   0.870   0.807   0.906   0.906   0.839
 2. Industrial Production  0.882   0.882   0.669   0.861   0.852   0.694   0.827   0.827   0.745
 3. Employment             0.861   0.861   0.849   0.861   0.859   0.803   0.844   0.842   0.823
 4. Unempl. Rate           0.800   0.799   0.771   0.750   0.723   0.723   0.762   0.743   0.647
 5. Housing                0.936   0.897   1.220   0.940   0.902   1.081   0.926   0.882   0.807
 6. Inventories            0.900   0.886   0.856   0.867   0.867   0.764   0.856   0.856   0.784
 7. Prices                 0.980   0.970   0.865   0.977   0.961   0.892   0.963   0.948   0.797
 8. Wages                  0.993   0.938   0.967   0.999   0.919   0.960   1.019   0.931   1.031
 9. Interest Rates         0.980   0.946   0.849   0.952   0.928   0.892   0.956   0.949   0.822
10. Money                  0.953   0.926   1.000   0.933   0.921   0.950   0.909   0.909   0.927
11. Exchange Rates         1.015   0.981   0.974   1.015   0.980   1.108   1.036   0.965   1.150
12. Stock Prices           0.983   0.983   0.840   0.977   0.955   0.893   0.974   0.961   1.039
13. Cons. Exp.             0.977   0.977   0.765   0.963   0.960   1.082   0.966   0.955   0.963

Figure 4: Plots of the Forecasting Values (△) vs. Real Observations (o) from
1985 to 2008.

Table 4 also indicates that the performance of CRSIR varies across categories.
It has outstanding performance for some categories, such as Industrial
Production, Unemployment Rate, Inventories, and Interest Rates. But it does not
work well for some others, such as Housing, Money, and Exchange Rates. Figure
4 plots six series from both CRSIR favored and non-favored categories. Three
of them in Figure 4(I) are from CRSIR favored categories and three of them in
Figure 4(II) are from CRSIR non-favored categories. From these plots, one can
see that the responses of the CRSIR non-favored series are quite disordered.
They behave more like white noise: the variations are big but the changes in the
means of x are not distinct. The inverse regression method aims to detect the
variation of $E(x|y)$. If the conditional expectations of x do not differ much
for different values of y, the estimation of the e.d.r.-directions will
be inaccurate, which leads to poor forecasting performance.



4 Conclusion and discussion 



Sliced inverse regression has become a popular dimension reduction method
in computer science, engineering and biology. In this article, we bring it to
macroeconomic forecasting when there is a large number of predictors
and high collinearity. Compared to the classical dynamic factor model, SIR
retrieves information not only from the predictors but also from the response.
Moreover, our cluster-based regularized SIR has the ability to handle highly
collinear or "T < N" data. The pseudo out-of-sample forecasting simulation shows
that it offers substantial improvements over the DFM-5 model on the
macroeconomic dataset.

After finding the CRSIR variates, we use linear models for forecasting
the responses because scatter plots of the CRSIR variates and y values show
strong linear relationships, and the results are desirable. But one may use
polynomials, splines, Lasso, or some other more advanced regression techniques
for different situations.

Based on its basic idea, there is more than one generalization of SIR
using higher order inverse moments, for instance, SAVE (Cook and Weisberg
(1991)), SIR-II (Li (1991b)), DR (Li and Wang (2007)), and SIMR (Ye and Yang
(2010)). Our cluster-based algorithm can also be applied to these methods for
highly collinear data, and good performance is expected.

Above all, we can conclude that the cluster-based regularized sliced inverse
regression is a powerful tool in forecasting using many predictors. It is not
limited to macroeconomic forecasting, and can also be applied to dimension
reduction or variable selection problems in social science, microarray analysis,
or clinical trials when the dataset is large and highly correlated.
Appendix

Assume the following linearity conditions.

Linearity Condition 2. For any $b \in \mathbb{R}^{N_i}$, the conditional
expectation $E(b'x_i \mid \theta^{(i)\prime} x_i)$ is linear in
$\theta_j^{(i)\prime} x_i$, $j = 1, 2, \ldots, k_i$.

Linearity Condition 3. For any $b \in \mathbb{R}^{N}$, the conditional
expectation $E(b'x \mid \Lambda'x)$ is linear in
$\lambda_1'x, \lambda_2'x, \ldots, \lambda_m'x$.

Linearity Condition 4. For any $b \in \mathbb{R}^{m}$, the conditional
expectation $E(b'\Lambda'x \mid \Gamma'\Lambda'x)$ is linear in
$\gamma_1'\Lambda'x, \gamma_2'\Lambda'x, \ldots, \gamma_v'\Lambda'x$.

Conditions 2 and 3 are satisfied when all the x's have elliptically symmetric
distributions, especially the multivariate normal distribution (Eaton (1986)).
Condition 4 is also satisfied when all the $\Lambda'x$ have elliptically
symmetric distributions, which is true because all the elliptically symmetric
distributed x's have been standardized to the same scale.

Li's Theorem 1 can be restated as follows for each cluster when E(x) = 0.

Theorem 3. (Li (1991a)) Under Linearity Condition 2, $E(x_i|y)$ is contained
in the space spanned by $\Sigma_{x_i} \theta_j^{(i)}$, $j = 1, \ldots, k_i$.

Furthermore, it is not hard to see the following.

Corollary 1. Under Linearity Condition 3, $E(x|y)$ is contained in the space
spanned by $\Sigma_x \lambda_1, \Sigma_x \lambda_2, \ldots, \Sigma_x \lambda_m$.

Corollary 2. Under Linearity Condition 4, $E(\Lambda'x|y)$ is contained in the
space spanned by
$\Sigma_{\Lambda'x} \gamma_1, \Sigma_{\Lambda'x} \gamma_2, \ldots, \Sigma_{\Lambda'x} \gamma_v$.

Based on the above results, we can conclude the following.

Theorem 4. Under Linearity Conditions 3 and 4, $E(x|y)$ is contained in
the space spanned by $\Sigma_x \Lambda \Gamma$.

Proof. Li (2000) proved Theorem 1, which is the same as Corollary 1 in
different notation, by showing that $E(x|y)$ can be written as

$$E(x|y) = \Sigma_x \Lambda \kappa_1(y),$$

where $\kappa_1(y) = (\Lambda' \Sigma_x \Lambda)^{-1} E(\Lambda'x|y)$.
Similarly, under Condition 4,

$$E(\Lambda'x|y) = \Sigma_{\Lambda'x} \Gamma \kappa_2(y),$$

where $\kappa_2(y) = (\Gamma' \Sigma_{\Lambda'x} \Gamma)^{-1} E(\Gamma'\Lambda'x|y)$
and $\Sigma_{\Lambda'x} = \Lambda' \Sigma_x \Lambda$.
Therefore,

$$E(x|y) = \Sigma_x \Lambda \kappa_1(y) = \Sigma_x \Lambda (\Lambda' \Sigma_x \Lambda)^{-1} E(\Lambda'x|y)$$

$$= \Sigma_x \Lambda (\Lambda' \Sigma_x \Lambda)^{-1} \Lambda' \Sigma_x \Lambda \Gamma \kappa_2(y)$$

$$= \Sigma_x \Lambda \Gamma \kappa_2(y).$$

That implies that $E(x|y)$ is in the e.d.r. space spanned by
$\Sigma_x \Lambda \Gamma$.

□